Version Control Made Easy: Introduction to Git and GitHub
🚧 Work in Progress!
Hey! I’m still working on this post. If you’re interested, keep an eye out for updates here — exciting stuff coming soon!!
You start with a file called script.py. Then it becomes script_final.py. Then script_final2.py. Eventually, you’re looking at script_final_final_really_final.py.
It gets messy fast.
A system that can remember what changed, when, why and by whom — without relying on awkward filenames or endless copies. That’s exactly where Version Control Systems (VCS) come in.
{Source: Google Gemini, height=500px, width= 1000px }
Version Control System (VCS)
helps developers track changes in source code over time. It allows you to:
- Revert to previous versions of code
- Compare changes made over time
- Identify who modified a file and when
- Collaborate with multiple contributors without overwriting each other’s work
- Experiment in branches without affecting the main codebase
Think of it as a magical time machine for your code. You can move back and forth between versions, experiment freely, and maintain a clear record of all changes.
There are two major types of VCS:
- Centralized Version Control Systems (CVCS)
- Distributed Version Control Systems (DVCS)
In this blog, I’m going to focus on DVCS specifically, on Git.
Git is the most widely used Version Control System today. It’s trusted by individual developers, open source communities, and large enterprises alike making it an essential tool in modern software development.
What is Git?
Git is the most popular Distributed Version Control System (DVCS) used by developers across the world. It was created in 2005 by Linus Torvalds, the creator of the Linux kernel, to manage the Linux source code more efficiently.
Unlike centralized systems, Git allows every user to have their own full copy of the repository, including its entire history.
Git Terminology
high-level overview of Git’s core concepts:
- Repository (repo): A directory that contains your project files and a .git folder, which tracks all changes.
- Commit: A snapshot of your files at a certain point in time. Each commit has a unique ID and message describing the change.
- Branch: A separate line of development. The default branch is usually called main or master.
- Merge: Integrates changes from one branch into another.
- Clone: Copy an entire repository from a remote server to your local machine.
- Push / Pull: Push sends your local commits to a remote repository (like GitHub). Pull fetches and integrates changes from remote to local.
Git follows a three-stage architecture:
- Working Tree – Where you make changes.
- Staging Area (Index) – Where you prepare changes for the next commit.
- Repository (.git folder) – Where committed changes are stored permanently.
Why Use Git?
- Speed: Most operations are local and incredibly fast.
- Data Integrity: history of every change is securely stored
- Branching and Merging: Branches are lightweight and encourage experimentation.
- Widespread Adoption: Git is used in open source and enterprise environments alike. Learning Git is an essential skill for any developer or data scientist.
Download and Install
To start using Git, you’ll need to install it on your machine. Here are the official resources:
Download Git
Installation Guide
Once installed, you can open a terminal (or Git Bash on Windows) and run:
git --version
If it returns a version number without any error, Git is ready to use!
Step-by-Step Git Workflow
We are going to use the directory created in our last session (ml_project). please reffer Virtual environments in ML projects if you do not have it already. (You can also create a new directory if you want).
ml_project/
├── data/ # Raw or processed datasets
├── notebooks/ # Jupyter notebooks for EDA
├── src/ # Source code / scripts / modules
├── models/ # Saved model files
├── venv/ # Virtual environment
├── requirements.txt # Package list
Set Your Name and Email (one-time setup)
git config --global user.name "Your Full Name"
git config --global user.email "your.email@example.com"
This configures your name and email and appears in your commit history and helps identify who made each change. Use the same email as your GitHub account to link commits correctly.
Initialize Git in Your Project
Add a .gitkeep or empty .txt file inside each folder so Git can track them even if they’re empty initially:
touch ml_project/{data,notebooks,src,models,outputs,tests}/.gitkeep
Once we have created a project directory we can initialize git by
git init
This turns your folder into a Git repository by creating a hidden .git/ folder. This folder contains everything Git needs to track changes, including:
- Commit history
- Branches
- Staging area (index)
- Configuration settings
It tells Git: “Start tracking this project.” It enables all Git features like git add, git commit, and git push.
To view hidden .git directory you can use
ls -l .git
Only run git init once per project. If you delete the .git/ folder, you lose all version history.
right after initializing git we create two files which are important
echo "# My Project Title" > README.md
touch .gitignore
README.md is a Markdown file that typically serves as the front page of your project.
It should include: - What the project does - How to install/run it - Dependencies - Usage examples and other important project details
.gitignore tells Git which files or folders NOT to track or include in commits. Examples of what to ignore:
- OS/system files (.DS_Store, Thumbs.db)
- IDE/editor files (.vscode/, .idea/)
- Logs or temporary files (.log, .tmp)
- Dependency folders (node_modules/, venv/)
- Sensitive data (.env, credentials.json)
In our example we are going to add “venv/” to .gitignore
Without a .gitignore, Git may track junk or private files that you never meant to upload.
Check the Current Git Status
This shows you: - New/untracked files - Changes that are staged or not staged - What’s ready to commit
git status
You’ll see something like:
““” On branch master
No commits yet
Untracked files: (use “git add
.gitignore
README.md
nothing added to commit but untracked files present (use “git add” to track) ““”
Now, to track these files
Stage the File for Commit
git add README.md
git add .gitignore
Think of git add as preparing the file to be saved (but not saving yet)
Working Directory ──(git add)──▶ Staging Area
Commit the File
git commit -m "Add README and .giignore files"
The -m flag adds a commit message describing what you did. it is always a good practice to give a proper desciption of what changes you have done in this commit. git commit takes a snapshot of your staged changes and permanently stores it in the Git repository.
Why Is git commit Important?
- It records your progress in small, logical steps
- Makes it easier to undo mistakes, collaborate, and track who did what
- Helps in code reviews and project management
Working Directory ──(git add)──▶ Staging Area ──(git commit)──▶ Git Repository
Check the Commit History
git log
git log displays the history of commits in your repository — from the most recent to the oldest.
It shows you: - Who made the commit (name and email) - When it was made - What the commit message was - A unique commit ID (SHA)
Wrapping Up
In this post, you’ve taken your very first steps with Git:
- You initialized a Git repository with git init
- Learned how to track changes with git add and git commit
- Explored how to inspect your project history with git log
- And got familiar with concepts like .gitignore and git status
- You now have a working Git project right on your local machine.
But what if you want to back up your code, access it from anywhere, or collaborate with others?
That’s where GitHub comes in!
What’s Next?
In the next blog, we’ll connect your local Git project to a remote repository on GitHub. You’ll learn:
- Why pushing to GitHub is useful
- How to create a GitHub repo
- How to connect it to your local repo
- How to push your code to the cloud!
These are the foundational steps every developer takes when beginning a new project, this is just the beginning.
Source
Connect with Me
I use AI tools to assist in writing and drafting some of the content on this blog. but all content is reviewed and edited by me for accuracy and clarity.
💬 Comments